Abstract

In humans, learning depends on the joint contribution of multiple interacting systems: working memory (WM), long-term memory (LTM) and reinforcement learning (RL). The present study aims to understand the relative contributions of these systems during learning, as well as the specific strategies individuals might rely on. Collins (2018) put forward a combined working memory-reinforcement learning model that addresses this question, but it largely ignores long-term memory. Using the Collins (2018) stimulus-response association task, we built four idiographic ACT-R learning models: two single-mechanism models (RL and LTM) and two integrated RL-LTM models (a meta-learning RL model and a parameter-based RL bias model). Different models provided the best fits for individual learners (LTM: 63%, RL: 1%, meta-RL: 12%, bias-RL: 21% of participants), which suggests that irreducible differences in learning and meta-learning strategies exist across individuals. The models predicted learning accuracy, learning rate and test accuracy for the subjects in their respective groups.

Objectives

This report describes the four ACT-R models and the learning outcomes produced by changes in their parameters. It also describes how these models fit behavioral data and details the properties of the best-fitting models and parameters. The specific objective of this project is to test whether the RLWM task can be modeled well by a group of pure and combined declarative and RL learning models. After fitting the models to participant data, we aim to extract parameters that may explain why and how learning unfolded as observed. If the parameters capture individual differences in learning, do they also predict other behavioral measures such as working memory capacity and reinforcement learning accuracy?

ACT-R Models

Below are the four ACT-R models tested. The model names that begin each item are used throughout this document.

  • RL: A pure RL model based on ACT-R’s learning of production utilities. The learning rate (alpha) and the softmax temperature are its only two parameters.

  • LTM: A declarative model that depends solely on the storage and retrieval of stimulus, response and outcome information in ACT-R’s declarative memory. This model depends on the decay rate, retrieval noise and spreading activation parameters.

  • meta_RL: A combined RL-LTM model. Information about trials performed by the RL system is shared with and stored in LTM (declarative memory). A separate (meta) RL system, implemented as a set of productions, learns and determines which subsystem, RL or LTM, is used throughout learning. Which subsystem is preferred depends on the specific set of parameters.

  • biased: A combined RL-LTM model. Information about trials performed by the RL system is not shared with the LTM portion of the model. An additional “strategy” parameter sets a bias toward the RL system at 20, 40, 60 or 80 percent of learning and test trials.
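As a rough illustration of the mechanisms named in the list above, the Python sketch below implements the standard ACT-R equations for utility learning, the softmax approximation to utility noise, base-level activation and retrieval probability. It is not the ACT-R code used in this study: spreading activation is omitted, the example values are arbitrary, and the `choose_system` arbitration for the biased model is a hypothetical stand-in for however the strategy parameter is actually applied.

```python
import math
import random


def update_utility(u, reward, alpha):
    """ACT-R utility learning: U(n) = U(n-1) + alpha * (R(n) - U(n-1))."""
    return u + alpha * (reward - u)


def softmax_choice_probs(utilities, s):
    """Choice probabilities from ACT-R's softmax approximation to utility noise.

    s is the utility noise (:egs); the equivalent temperature is t = sqrt(2) * s.
    """
    t = math.sqrt(2) * s
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp((u - top) / t) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def base_level_activation(ages, d):
    """Base-level activation B = ln(sum_j t_j ** -d).

    ages: seconds since each past presentation of the chunk; d: decay (:bll).
    """
    return math.log(sum(a ** -d for a in ages))


def retrieval_prob(activation, tau, s):
    """Probability that activation exceeds threshold tau given noise s (:ans)."""
    return 1.0 / (1.0 + math.exp(-(activation - tau) / s))


def choose_system(bias, rng):
    """Hypothetical arbitration: route a trial to RL with probability `bias`."""
    return "RL" if rng.random() < bias else "LTM"


# Usage with illustrative values (not fitted values from this study).
u = update_utility(0.0, reward=1.0, alpha=0.15)       # 0.15 after one reward
probs = softmax_choice_probs([u, 0.0, 0.0], s=0.25)   # slightly favors the rewarded response
b = base_level_activation([10.0, 60.0], d=0.5)
p = retrieval_prob(b, tau=0.0, s=0.3)
system = choose_system(0.4, random.Random(1))
```

In this sketch the pure RL model would use only the first two functions, the LTM model only the last activation-based two, and the combined models would route each trial to one of them.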

Approach

The models are fit to behavioral data, and the best-fitting model and parameter set are selected by comparing BIC values: the lowest BIC determines the winning model. To assess the quality of the fitted models and parameters, RLWM task learning features were compared to the model outcomes. The features of interest are:

  • Accuracy at the end of learning (accuracy after 12 stimulus presentations)

  • Accuracy at test

  • Change in accuracy from the end of learning to test

  • Learning rate

  • Differences in the learning trajectories of the two set sizes

The expectations and outcomes are described below.
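The selection rule above can be sketched as follows. This assumes that each candidate (a model plus one parameter-value set) yields a log-likelihood for a participant's choices; the report does not say how that likelihood was computed, and the log-likelihoods, trial count and parameter tuples below are made up for illustration.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_best(candidates):
    """candidates: list of (model, params, bic). Returns the lowest-BIC entry."""
    return min(candidates, key=lambda c: c[2])


# Hypothetical log-likelihoods for one participant's choices (not real data).
n_trials = 300
candidates = [
    ("LTM", (0.5, 0.3, 0.3), bic(-150.0, 3, n_trials)),
    ("RL", (0.15, 0.25), bic(-160.0, 2, n_trials)),
]
winner = select_best(candidates)
```

The k * ln(n) term penalizes the combined models for their extra parameters, so they win only when their gain in likelihood outweighs that penalty.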

Results

Model fits

Of the four models compared, the LTM model fit the largest number of participants (61), followed by the biased combined RL-LTM model (11) and the meta-RL combined model (9). The RL-only model was the best fit for only two participants (Figure 1). This is a slight departure from our expectation that the combined RL-LTM models would fit the majority of participants, and it suggests that most learners simply memorize the stimulus-response associations.

Figure 1. Counts of fit subjects by model

Within each group of participants (grouped by best-fitting model type), the RL group has only 2 best-fitting combinations of alpha and softmax parameter values. For the most popular model, LTM, which fit 61 participants, there were surprisingly only 14 best-fitting parameter-value sets for the spreading activation, retrieval noise and memory decay rate parameters. The biased model was the most diverse, with 11 parameter sets for 11 participants. The meta-RL model closely followed the biased model in terms of diversity, with 8 parameter-value sets for 9 subjects. Figures 2 and 3 show the medians and ranges of the BIC values, which indicate that LTM is the best-fitting model even when only the best-fitting parameter-value set in each model category is compared.
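The diversity counts above (subjects per group versus distinct best-fitting parameter sets) amount to a simple group-and-count. A minimal sketch, assuming one row per participant holding the best-fitting model and its parameter values as a tuple; the function name and rows are hypothetical.

```python
from collections import defaultdict


def parameter_set_diversity(best_fits):
    """best_fits: iterable of (subject, model, params), params a hashable tuple.

    Returns {model: (n_subjects, n_distinct_parameter_sets)}.
    """
    n_subjects = defaultdict(int)
    distinct = defaultdict(set)
    for _subject, model, params in best_fits:
        n_subjects[model] += 1
        distinct[model].add(params)
    return {m: (n_subjects[m], len(distinct[m])) for m in n_subjects}


# Hypothetical rows: two LTM subjects share the same parameter set.
rows = [
    (1, "LTM", (0.5, 0.3, 0.3)),
    (2, "LTM", (0.5, 0.3, 0.3)),
    (3, "biased", (0.15, 0.25, 0.4)),
]
summary = parameter_set_diversity(rows)  # {"LTM": (2, 1), "biased": (1, 1)}
```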

Figure 2.

Figure 2 shows that the LTM model has the lowest BIC values.

Figure 3.

How consistent are the fits observed above? Given a participant's best fit, how many of the next-best-fitting parameter sets belong to the same model category?

Statistics on the rank at which the next best fit from a different model category occurs for each participant, by best-fitting model
model mean median sd min max
biased 298.18182 192.0 470.524987 3 1620
LTM 42.00000 12.0 48.447910 2 122
metaRL 73.33333 52.0 82.649864 2 229
RL 5.50000 5.5 4.949747 2 9
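Assuming the table summarizes, per participant, the rank of the first fit from a different model category (so a rank of 2 means the second-best fit already came from another model), that rank could be computed as in the sketch below. The function name and the BIC values are hypothetical.

```python
def first_other_model_rank(fits):
    """fits: list of (bic, model) for every parameter set fit to one participant.

    Returns the 1-based rank of the first fit whose model category differs
    from the best fit's category, or None if all fits share one category.
    """
    ranked = sorted(fits, key=lambda f: f[0])
    best_model = ranked[0][1]
    for rank, (_bic, model) in enumerate(ranked, start=1):
        if model != best_model:
            return rank
    return None


# Hypothetical BIC values: the first non-LTM fit appears at rank 3.
example = [(412.0, "LTM"), (409.5, "LTM"), (415.2, "metaRL"), (420.0, "RL")]
rank = first_other_model_rank(example)  # 3
```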
Figure 4

Large BIC differences among the first 2 to 5 ranked fits are critical for providing good evidence against the second and lower-ranked fits. Figure 5 below shows the rank-ordered differences between consecutive BIC values. The difference is largest between the first two fits, but it falls short of providing strong evidence that the best fit is preferred over the second-best fit.
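One common convention for reading such gaps treats a BIC difference as roughly 2 ln(Bayes factor) and uses the Kass and Raftery (1995) scale: below 2 is barely worth mentioning, 2 to 6 positive, 6 to 10 strong, above 10 very strong. Whether this report used exactly these cut-offs is an assumption; the sketch and its BIC values are illustrative.

```python
import math


def consecutive_differences(bics):
    """Gaps between consecutive rank-ordered BIC values (2nd - 1st, 3rd - 2nd, ...)."""
    ranked = sorted(bics)
    return [b - a for a, b in zip(ranked, ranked[1:])]


def evidence_label(delta_bic):
    """Kass and Raftery (1995) scale, treating delta BIC as roughly 2 * ln(Bayes factor)."""
    if delta_bic < 2:
        return "weak"
    if delta_bic < 6:
        return "positive"
    if delta_bic < 10:
        return "strong"
    return "very strong"


def approx_bayes_factor(delta_bic):
    """Approximate Bayes factor in favor of the lower-BIC fit: exp(delta BIC / 2)."""
    return math.exp(delta_bic / 2.0)


gaps = consecutive_differences([410.0, 413.5, 414.0])  # [3.5, 0.5]
labels = [evidence_label(g) for g in gaps]             # ["positive", "weak"]
```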
Figure 5

These differences may vary when broken down by model type. Figure 6 below shows that the LTM model has larger differences than the other models, meaning that participants best fit by the LTM model had more evidence against their second-best fit than participants best fit by the other models.
Figure 6

Out of curiosity, how often is the best-fitting model type also selected for the same participant's next-best fits? This tells us whether or not the subsequent fits differ only in their parameter values.
figure 6

These subjects had second-best fits that came from a different model group. X1 to X3 are the BIC differences between consecutively ranked fits.
subjects X1 X2 X3 model
6234 6.1143415 1.2425180 2.0800821 LTM
6241 2.9286522 0.8870578 0.5414593 metaRL
15002 3.5360927 0.5729736 0.5206463 metaRL
15005 0.5654471 2.1093758 0.3575417 RL

Assessments of model fits

Looking at the learning curves for the four models in Figure 4, the differences in learning rates are apparent, as are other features such as the separation between the two set sizes. In the plot below, each data point is the average accuracy, for that number of stimulus presentations, across all parameter combinations. The LTM and RL models predict that an increase in set size does not diminish learning rate or accuracy. However, this analysis washes out the individual differences that could be captured by the diverse set of parameter combinations.

Figure 7.

Descriptive statistics for models: end-of-learning accuracy (N = number of parameter combinations)
model setSize N accuracy sd se ci
bias s3 12500 0.8095120 0.1591722 0.0014237 0.0027906
bias s6 12500 0.7602101 0.1323282 0.0011836 0.0023200
LTM s3 125 0.9914667 0.0055098 0.0004928 0.0009754
LTM s6 125 0.9892133 0.0103426 0.0009251 0.0018310
metaRL s3 3125 0.8843605 0.0726357 0.0012993 0.0025477
metaRL s6 3125 0.9063451 0.0735543 0.0013158 0.0025799
RL s3 25 0.7770667 0.1648549 0.0329710 0.0680488
RL s6 25 0.7736667 0.1696749 0.0339350 0.0700384
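The se and ci columns appear to follow se = sd / sqrt(N) and ci = t(0.975, N - 1) * se: for the RL rows (N = 25), ci / se is about 2.064, the 0.975 quantile of Student's t with 24 df. The sketch below reproduces that layout; the function name is hypothetical, and because the Python standard library has no t quantile function, the caller supplies `t_crit` (falling back to the normal quantile, which is only adequate for large N such as the biased rows).

```python
import math
import statistics


def summary_se(values, t_crit=None):
    """N, mean, sd, se and 95% CI half-width, in the layout of the table above.

    se = sd / sqrt(N) and ci = t_crit * se, where t_crit should be the 0.975
    quantile of Student's t with N - 1 df. If t_crit is omitted, the normal
    quantile (about 1.96) is used, which is only adequate for large N.
    """
    n = len(values)
    sd = statistics.stdev(values)  # sample sd, N - 1 denominator
    se = sd / math.sqrt(n)
    if t_crit is None:
        t_crit = statistics.NormalDist().inv_cdf(0.975)
    return {"N": n, "mean": statistics.fmean(values), "sd": sd, "se": se, "ci": t_crit * se}


# Check against the RL s3 row: sd = 0.1648549, N = 25, t(0.975, 24) = 2.063899.
se_rl = 0.1648549 / math.sqrt(25)  # about 0.0329710
ci_rl = 2.063899 * se_rl           # about 0.0680488
```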

The panels in Figure 8 show the mean accuracy for participant behavioral data. The model lines are averages across parameters for that group only. Because we are taking an individual-differences look at these data, collapsing across so much variability is uninformative, as shown above in Figure 4, especially if the differences, once fit to actual behavioral data, indicate large differences in learning outcomes or in diagnostics of cognitive faculties such as working memory capacity. Here, only the best-fitting parameter combinations were selected and collapsed. As the figure below shows, the model types appear vastly different, and some characteristics of the behavioral data come through, such as the separation of the learning trajectories for the two set sizes in the RL-LTM biased model fit. Some parameter sets in the LTM model also capture the difficulty associated with increasing set size (solid lines in Fig. 8B). The LTM participants, on average, have the highest test-phase accuracy for both set sizes, but they are nearly indistinguishable from the meta-RL group in accuracy at the end of learning. The biased group shows the most separation between set sizes 3 and 6 during learning and also lower accuracy at test than the LTM group. The biased group is negligibly different from the meta-RL group for set size 3 but shows a marked difference at set size 6, closely following the behavioral data.

Figure 8.

For reference, the group mean of all 83 subjects is shown in figure 9 below.
Figure 9

There are five outcome measures of interest in the RLWM task: accuracy at the end of learning, accuracy at test, learning rate (characterized as the slope estimate over the first 6 trials), the difference in learning between set sizes 3 and 6, and the level of learning preserved at test for both set sizes (test minus learn). The following analyses compare the model data with the behavioral data on these outcome measures.
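Two of these measures can be sketched directly. The report does not state which regression produced the slope estimate, so the sketch assumes an ordinary least-squares slope of accuracy on presentation number over the first 6 presentations; the function names and the accuracy curve are hypothetical.

```python
def learning_rate(accuracy_by_presentation, n_first=6):
    """OLS slope of accuracy on presentation number (1, 2, ...) over the first n_first points."""
    ys = accuracy_by_presentation[:n_first]
    xs = list(range(1, len(ys) + 1))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def preserved_learning(test_acc, end_of_learning_acc):
    """test - learn: negative values mean accuracy dropped from end of learning to test."""
    return test_acc - end_of_learning_acc


# Hypothetical curve rising by 0.1 per presentation over the first six presentations.
curve = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
slope = learning_rate(curve)  # about 0.1
```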

Figure 10 below shows accuracy at end of learning and test. The models closely track the behavioral data. Note that the RL group has only two data points.
figure 10

2 x 2 set size by iteration (learn vs. test) ANOVA table for behavioral data
term df sumsq meansq statistic p.value
setSize 1 0.0000641 0.0000641 0.0048209 0.9446873
iteration 1 1.2908057 1.2908057 97.1443094 0.0000000
setSize:iteration 1 0.1081829 0.1081829 8.1416983 0.0046007
Residuals 328 4.3583024 0.0132875 NA NA
2 x 2 set size by iteration (learn vs. test) ANOVA table for model data
term df sumsq meansq statistic p.value
setSize 1 0.0001946 0.0001946 0.0176228 0.8944717
iteration 1 2.2540410 2.2540410 204.1447864 0.0000000
setSize:iteration 1 0.0918060 0.0918060 8.3147207 0.0041923
Residuals 328 3.6215740 0.0110414 NA NA
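The meansq and statistic columns in these tables follow from the sumsq and df columns: each term's mean square is sumsq / df, and its F statistic is that mean square divided by the residual mean square. The sketch below reproduces the iteration row of the behavioral table; the function name is hypothetical, and the p-value, which needs the F distribution's CDF, is not computed.

```python
def anova_term(sumsq, df, resid_sumsq, resid_df):
    """Mean square and F statistic for one ANOVA term.

    meansq = sumsq / df and F = meansq / (resid_sumsq / resid_df).
    """
    meansq = sumsq / df
    return meansq, meansq / (resid_sumsq / resid_df)


# Behavioral-data table, iteration (learn vs. test) row.
ms, f = anova_term(1.2908057, 1, 4.3583024, 328)  # F is about 97.14
```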

The models predict the set size 3 learning rate for most groups; the exception is the biased model, and the RL group has too few data points to say. For set size 6, however, only the biased model predicts the learning rate. See Figure 11 below.

Figure 11.

#> 
#>  Welch Two Sample t-test
#> 
#> data:  estimate by setSize
#> t = 10.149, df = 142.26, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  0.02799973 0.04154511
#> sample estimates:
#> mean in group s3 mean in group s6 
#>       0.11481641       0.08004399
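The output above is R's Welch test (the default for `t.test`, which does not assume equal variances). The sketch below shows how its t statistic and fractional degrees of freedom (the Welch-Satterthwaite approximation) are formed from two samples; the function name and toy data are hypothetical, and the p-value, which needs the t distribution, is omitted.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy check with equal variances and sizes: df reduces to n1 + n2 - 2.
t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])  # t = -1.0, df = 8.0
```

With unequal variances or group sizes, df falls below n1 + n2 - 2 and is generally fractional, as in the df = 142.26 reported above.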
Descriptive stats of model and behavioral learning rate
setSize type model mean se
s3 behav biased 0.1040043 0.0056783
s3 behav LTM 0.1171351 0.0021185
s3 behav metaRL 0.1105820 0.0057589
s3 behav RL 0.1226190 0.0059524
s3 model biased 0.0862078 0.0073645
s3 model LTM 0.1138970 0.0008033
s3 model metaRL 0.1019048 0.0067562
s3 model RL 0.1102857 0.0184762
s6 behav biased 0.0555556 0.0098350
s6 behav LTM 0.0861176 0.0028722
s6 behav metaRL 0.0755732 0.0067367
s6 behav RL 0.0496032 0.0170635
s6 model biased 0.0584329 0.0060944
s6 model LTM 0.1025379 0.0010801
s6 model metaRL 0.0978889 0.0097616
s6 model RL 0.1125000 0.0179286
Figure 12

Kruskal-Wallis one-way rank sum test: s6-s3 learning curve differences by model type for behavioral data
statistic p.value parameter method
23.9086 2.61e-05 3 Kruskal-Wallis rank sum test
Kruskal-Wallis one-way rank sum test: s6-s3 learning curve differences by model type for model data
statistic p.value parameter method
36.58295 1e-07 3 Kruskal-Wallis rank sum test
Pairwise post hoc tests for behavioral data
group1 group2 p.value
LTM biased 0.0000000
metaRL biased 0.0000012
RL biased 1.0000000
metaRL LTM 1.0000000
RL LTM 0.3315925
RL metaRL 0.1719717
Pairwise post hoc tests for model data
group1 group2 p.value
LTM biased 0.00e+00
metaRL biased 0.00e+00
RL biased 0.00e+00
metaRL LTM 4.57e-05
RL LTM 1.00e+00
RL metaRL 1.00e+00
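For readers unfamiliar with the Kruskal-Wallis statistic reported above (with df = number of groups - 1, here 3), the sketch below computes H from pooled ranks, averaging ranks across ties and applying the usual tie correction. It is an illustration of the statistic, not the code that produced these tables; the function names and toy groups are hypothetical, and the p-value (from a chi-squared distribution) is omitted.

```python
from itertools import chain


def rank_with_ties(values):
    """1-based ranks, averaging ranks across tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with the standard tie correction; df = len(groups) - 1."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = rank_with_ties(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    return h / (1 - ties / (n ** 3 - n))


h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # 7.2
```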
Figure 13.

2 (set size) x 2 (type: model or behavioral data) x 3 (model) ANOVA; RL group excluded.
term df sumsq meansq statistic p.value
setSize 1 0.3163195 0.3163195 33.6847822 0.0000000
type 1 0.1245720 0.1245720 13.2656380 0.0003164
model 2 0.1781924 0.0890962 9.4878330 0.0001000
setSize:type 1 0.0005759 0.0005759 0.0613223 0.8045809
setSize:model 2 0.2315704 0.1157852 12.3299363 0.0000070
type:model 2 0.0242137 0.0121068 1.2892529 0.2769400
setSize:type:model 2 0.0030133 0.0015066 0.1604423 0.8518372
Residuals 312 2.9298592 0.0093906 NA NA

It is difficult to assess what the model fits are capturing without examining the specific parameter sets more carefully or determining whether membership in a particular model group predicts other cognitive or learning characteristics of the subjects. A summary of the parameter data follows, first for the cohort of subjects.

Parameters

Parameter spread

Parameter summary: what is the spread of the parameters across participants in the models?
Figure 14.

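Means and medians of parameter values (alpha: utility learning rate; egs: utility noise, i.e., softmax temperature; bll: base-level decay; imag: imaginal (spreading) activation; ans: activation (retrieval) noise; bias: strategy bias)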
variable mean median
alpha 0.150000 0.1500000
egs 0.250000 0.2500000
bll 0.525000 0.5500000
imag 0.300000 0.3000000
ans 0.300000 0.3000000
bias 0.334123 0.3195545

Individual parameter effects on outcomes

Figure 15

Figure 16

Individual plots: